Code
library(needs)
needs(igraph,
igraphdata,
netseg)
data("UKfaculty")Theoretical Insights and it’s Measurements
Welcome to this session of the seminar CSS!
Different theories on roles and conflicts between roles draw on this idea, that society exists were different individuals interact with each other. Economic, social, and political actions are embedded in social networks and relationships. This means that individuals and organizations are not isolated actors, but are instead part of a larger web of social connections that shape their behavior and decision-making.
Measures of reciprocity are based on the dyad census: Any pair of vertices in a network is called a dyad. There are three possible states of dyads (in a directed network): null dyads (no edge), b) asymmetric dyads (one directed edge) and c) mutual dyads (two directed edges).
null <- graph_from_literal(1, 2)
as <- graph_from_literal(1-+2)
mutual <- graph_from_literal(1-+2, 2-+1)
g <- graph_from_literal(1-+2, 2-+3, 2+-3, 5-+2, 3--+5, 5-+3, 3-+4)
layout_line <- cbind(1:2, rep(1, 2))
plot(null,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
plot(as,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
plot(mutual,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
# Plotte das Netzwerk mit Lavender vertices
plot(g,
vertex.color = "lavender",
vertex.size = 30,
edge.width = 2,
arrow.size = 0.1,
layout = layout_with_fr)| null | asymmetric | mutual |
|---|---|---|
| 5 | 3 | 2 |
The level of reciprocity shows whether a network mainly consists of mutual or asymetrical edges and can be calculated in R using either ‘default’-mode (number of reciprocated edges divided by the total number of edges) or ‘ratio’-mode (number of mutual dyads divided by the number of asymmetric dyads).
Describing the relationships of nodes draws back on mathematical relations. The following table summarizes the most important relations:
| Property | Definition |
|---|---|
| Reflexive | ∀ a ∈ A: (a, a) ∈ R |
| Symmetric | ∀ a, b ∈ A: (a, b) ∈ R ⇒ (b, a) ∈ R |
| Transitive | ∀ a, b, c ∈ A: (a, b) ∈ R ∧ (b, c) ∈ R ⇒ (a, c) ∈ R |
| Asymmetric | ∀ a, b ∈ A: (a, b) ∈ R ⇒ (b, a) ∉ R |
| Antisymmetric | ∀ a, b ∈ A: (a, b) ∈ R ∧ (b, a) ∈ R ⇒ a = b |
| Irreflexive | ∀ a ∈ A: (a, a) ∉ R |
| Complete | ∀ a, b ∈ A: (a, b) ∈ R ∨ (b, a) ∈ R |
Any set of 3 vertices in a network is called a triad. In directed networks, 16 different triads exist (Davis and Leinhardt 1967).
# Define all 16 triad types using adjacency matrices
triad_matrices <- list(
matrix(c(0,0,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 003
matrix(c(0,1,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 012
matrix(c(0,1,0, 1,0,0, 0,0,0), nrow=3, byrow=TRUE), # 102
matrix(c(0,1,1, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 021D
matrix(c(0,0,0, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 021U
matrix(c(0,1,0, 0,0,1, 0,0,0), nrow=3, byrow=TRUE), # 021C
matrix(c(0,1,0, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 111D
matrix(c(0,1,0, 1,0,1, 0,0,0), nrow=3, byrow=TRUE), # 111U
matrix(c(0,1,1, 0,0,0, 0,1,0), nrow=3, byrow=TRUE), # 030T
matrix(c(0,1,0, 0,0,1, 1,0,0), nrow=3, byrow=TRUE), # 030C
matrix(c(0,1,1, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 201
matrix(c(0,1,1, 0,0,1, 0,1,0), nrow=3, byrow=TRUE), # 120D
matrix(c(0,0,0, 1,0,1, 1,1,0), nrow=3, byrow=TRUE), # 120U
matrix(c(0,1,1, 0,0,1, 1,0,0), nrow=3, byrow=TRUE), # 120C
matrix(c(0,0,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE), # 210
matrix(c(0,1,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE) # 300 (mutual all)
)
triad_names <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U", "030T", "030C",
"201", "120D", "120U", "120C", "210", "300")
# Plot each triad
par(mfrow = c(4, 4), mar = c(1, 1, 2, 1)) # 4x4 grid layout
for (i in 1:16) {
adj <- triad_matrices[[i]]
g <- graph_from_adjacency_matrix(adj, mode = "directed")
plot(
g,
main = triad_names[i],
vertex.color = "lavender",
vertex.size = 30,
vertex.label = NA,
vertex.label.cex = 0,
edge.arrow.size = 0.3,
layout = layout_in_circle
)
}The codes like 102, 021C, 120U may feel cryptic at first, but they follow a logical pattern:
** Basic structure: Three-digit code
Example: ‘102’
| Digit | Meaning |
|---|---|
1 |
1 mutual (reciprocal) relationship (e.g., A ↔︎ B) |
0 |
0 one-way relationships |
2 |
2 node pairs have no relationship |
Add-On lettering:
For more complex cases, a letter is added to describe the triad’s shape or direction:
| Code | Description |
|---|---|
021D |
Two one-way edges going down (A → B → C) |
021U |
Two one-way edges going up (C → B → A) |
021C |
A circle of one-way edges (A → B → C → A) |
030T |
A transitive triangle (A → B → C and A → C) |
030C |
A cycle with three one-way edges (A → B → C → A) |
120D |
One mutual edge, two one-way edges in a downward direction |
120U |
One mutual edge, two one-way edges in an upward direction |
120C |
A circle-like pattern with mutual + one-way ties |
210 |
Two mutual dyads and one one-way edge |
300 |
All three dyads are mutual (fully connected triad) |
We can compute the triad census of a network using
In undirected networks only 4 distinct triads exist.
# Define the 4 undirected triad types
undirected_triad_matrices <- list(
matrix(c(0,0,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 0 edges: Empty (Type: 003)
matrix(c(0,1,0, 1,0,0, 0,0,0), nrow=3, byrow=TRUE), # 1 edge: One dyad (Type: 102)
matrix(c(0,1,1, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 2 edges: Open triad (Type: 201)
matrix(c(0,1,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE) # 3 edges: Complete triad (Type: 300)
)
undirected_triad_names <- c("003", "102", "201", "300")
# Plot each undirected triad
par(mfrow = c(2, 2), mar = c(1, 1, 2, 1)) # 2x2 grid layout
for (i in 1:4) {
adj <- undirected_triad_matrices[[i]]
graph <- graph_from_adjacency_matrix(adj, mode = "undirected")
plot(
graph,
main = undirected_triad_names[i],
vertex.color = "lavender",
vertex.label = NA,
vertex.size = 30,
vertex.label.cex = 0,
layout = layout_in_circle
)
}Triadic closure is a concept in social network theory that was first introduced by Georg Simmel. It describes the tendency of two nodes \(A\) and \(B\) to become connected if they share a common neighbor \(C\). It can be used to understand and predict the formation of social ties in network (of course, there are many other factors influencing tie formation).
A clustering coefficient is a measure of the degree to which vertices in a graph tend to cluster together. Evidence suggests that in most social networks, groups tend to create highly clustered subgroups.
Global clustering coefficient or transitivity:
In an undirected graph a connected triple is a path of length 2 (i.e. triad 201). A triangle is a cycle of exactly three nodes (i.e. triad 300). The global clustering coefficient is then defined as
\[ G_g = \frac{(\text{number of triangles} \times 3)}{(\text{number of connected triples})} \]
Heiders balance theory [-heiderPsychologyInterpersonalRelations1958] that explores how individuals perceive and maintain balance in their social relationships.
It involves triads of entities typically represented as a Person \(P\), another Person \(O\), and and Object \(X\).
If \(P\) likes \(X\) but dislikes \(O\), and then learns that \(O\) likes \(X\), then \(P\) will feel uncomfortable and will try to restore balance by either changing their attitude towards \(O\) or \(X\).
Cognitive balance is achieved, when there are three positive links or two negatives with one positive.
A signed graph is a graph where each edge has a sign (positive or negative). The sign of an edge can represent the nature of the relationship between two nodes. For example, in social networks, a positive edge might represent friendship, while a negative edge might represent enmity or dislike.
Cartwright and Harari(1956) extended Heider’s theory into graph theory, modeling people as nodes and their sentiments as signed edges (+ for friendship, − for hostility). They viewed triads as 3-cycles in a signed graph.
A fully signed graph is balanced, if every triad in it is balanced.
Additional assumption: A triad with three negative edges is also considered balanced (Davis 1967).
Then a graph is called groupable if it can be partitioned into two groups such that all edges between the groups are positive and all edges within the groups are negative.
Segregation: network level property-Segregation refers to the extent to which actors and social groups are separated from another. Segregation exists if the share of ties between groups is much less pronounced than the share of in-group ties, or if the groups are fully separated from each other.
The literature does not agree regarding the exact terminology of homophily and segregation. To some authors, homophily refers to both the macro-phenomenon (lack of intergroup relationships) and the micro-mechanism (similarity preferences). To others, homophily refers to the macro-phenomenon and homophilic or homophilious selection refer to the micro-mechanism.
I concur with other researchers on the terminology of homophily (preferences) for the mechanism and segregation for the macro-pattern.
# Check the attributes available in the dataset
vertex_attr_names(UKfaculty)
# Assuming 'Group' is an attribute in the dataset
# Assign different colors to each unique group
group_colors <- rainbow(length(unique(V(UKfaculty)$Group)))
# Create a vector of colors for each vertex based on its group
vertex_colors <- group_colors[V(UKfaculty)$Group]
# Plot the graph with vertex colors based on the 'Group' attribute
plot(UKfaculty, vertex.color = vertex_colors)Denotes the social phenomenon that people tend to be similar to their social contacts.
Birds of a feather - Billie Eilish has the same title as one of the most cited homophily papers by McPherson, Smith-Lovin and Cook (2001), where various studies of homophily were summarised. Homophily is observed in regard to race and ethnicity, gender and age, religion, education, occupation and social class, behaviour, and attitudes and beliefs
Previous research has found evidence of homophily regarding:
Other, secondary types of homophily can reinforce segregation on an attribute. Example: Ethnic segregation can be reinforced by socioeconomic homophily, when ethnic origin (node color) and socioeconomic status (node shape) overlap (e.g., many ethnic minority members have a lower socioeconomic status and many ethnic majority members have a higher socioeconomic status)
For an explanation see (Wimmer and Lewis 2010, 588–94)
Let \(e_{AB}\) be the number of edges between two types of actors \(A\) and \(B\) (e.g. different faculties) and \(m\) be the total numbers of edges in the network. Then homophily exists if \(\frac{e_{AB}{m}\) is significantly lower than the probability of a random connection between two actors from type \(A\) and type \(B\).
We can also use the assortavity coefficient for measurement.
Assortavity coefficients are basically correlation coefficients that (in the case of categorical variables) can be measured by \(e_{i,j}\) being the ratio of edge from actors of type \(i\) to type \(j\)
Let \(e_{ij}\) be the ratio of edges from actors of type \(i\) to type \(j\).
\(a_i = \sum_j e_{ij} \quad \text{(ratio of edges outgoing from actors of type \( i \))}\)
\(b_j = \sum_i e_{ij} \quad \text{(ratio of edges incoming to actors of type \( j \))}\)
The assortativity coefficient ( r ) is defined as:
\[ r = \frac{\sum_{i}e_{ii}-\sum_{i}a_ib_i}{1-\sum_{i}a_ib_i} \]
\(r = 0 \quad \text{if} \quad e_{ii} = a_i b_i \quad \text{(random connections)}\)
\(r = 1 \quad \text{if} \quad \text{only actors of the same type are connected}\)
\(r = - \frac{\sum_i a_i b_i}{1 - \sum_i a_i b_i} \quad \text{if} \quad \text{only actors of different types are connected}\)
or odds-ratios of within-group ties (Bojanowski and Corten 2014)
telling us, that same-group tie odds are 12.866 times greater than tie odds between groups.
or Colemans Index (Coleman 1958) that compares the proportion of same-group neighbours to the proportion of that group in the network as a whole.
which compares the proportion of same-group neighbors to the proportion of that group in the network as a whole. It is a number between -1 and 1. Value of 0 means these proportions are equal. Value of 1 means that all ties outgoing from a particular group are sent to the members of the same group. Value of -1 is the opposite – all ties are sent to members of other group(s).
Measuring homophily preferences is difficult as we cannot directly observe preferences. In the social networks literature, the most common ways to estimate the strength of homophily preferences are Exponential Random Graph Models (ERGMs) and Stochastic Actor-Oriented Models (SAOMs). These estimate the existence of ties in dependence of actor attributes (such as similarity). While we shortly be talking about these models in week 7, I still want to point out the caveat of ERGMs and SAOMs that they rely on a revealed preference assumption and infer preferences from the observed network structure.
Another approach to testing homophily preferences are permutation tests. A permutation is the result of a random reshuffle of the row and column names in an adjacency matrix. This creates a random network with the same structural properties (e.g., reciprocated ties) as the observed network. Who is connected to whom is, however different now. The level of segregation in the permuted network is now solely based on the network composition. If we permute the network a large number of times, we receive a distribution of segregation values expected due to the network composition. Then, we can test whether the observed segregation is statistically significant from the distribution of permutations. A significant difference tells us that preferences are (likely) the reason for the difference between the mean of the permutations and the observation. We, however, do not know which preferences are responsible!
Reflective question: From what stage on, can we say a network is homophile or segregated?
Granovetters paper is one of the most cited papers in sociology (right after Foucault) and social network analysis. It has been cited over 50,000 times and is considered foundational work in the field of social network analysis. The concept of the strength of weak ties has been widely applied in various fields, including sociology, communication studies, and organizational behavior.
Following Granovetter, the strength of ties is measured by the amount of time, emotional intensity, intimacy, and reciprocal services that characterize the tie. Granovetter explicitly differentiates between strong and weak ties.
Networks follow a so called transitivity tendency meaning that if \(A\) is connected to \(B\) and \(C\), then \(B\) and \(C\) are more likely to be (positively) connected to each other than if \(A\) was not connected to either of them. This is also called the triadic closure property of networks. This is due to meeting opportunities, homophily and structural balance.
Following these arguments, Granovetter calls triads, in which \(A\) has strong ties to \(B\) and \(C\), but no tie exists between \(B\) and \(C\), forbidden triads. He argues that these triads are unstable and will be dissolved over time. Empirically forbidden triads occur in core networks.
With this in mind, Granovetter argues that the transitivity tendency a function of strength of ties and not a general property of networks. Going through the mechanisms that drive the transitivity tendency. this becomes more obvious.
Meeting opportunities: If \(A\) has weak ties to \(B\) and \(C\), they less often interact if it where strong ties. This means that the likelihood of \(B\) and \(C\) meeting is lower than if \(A\) has strong ties to both of them.
Homophily: People have a preference to interact with people who are similar to them. If they aren’t relationsships are probably more superficial and instrumental. This means that the likelihood of \(B\) and \(C\) being similar is lower than if \(A\) has strong ties to both of them.
Balance theory: In the case of weak ties, people are less strained by cognitive disbalance.
This is also empirically supported: Core networks of friends have stronger triadic closure than peripheral networks of acquaintances.
The fact that people play different roles and have different influences inside groups and communities has motivated centuries of sociological and psychological research, so it is unsurprising that the concept of vertex importance and influence is of great interest in the study of people or organizational networks.
But importance and influence are not precisely defined concepts, and to make them real within the context of graphs and networks we need to find some sort of mathematical definition for them. In many visual graph layouts, more important or influential vertices that have stronger roles in overall connectivity will usually be positioned toward the center of a group of other vertices. Intuitively therefore, we use the term ‘centrality’ to describe the importance or influence of a vertex in the connected structure of a graph.
# Edgelist from Ona Book
g <- graph_from_literal(1--2, 3--4, 4--1, 4--2, 4--5, 4--6, 6--7, 4--6, 7--8, 8--4, 8--9, 9--10, 10--11, 11--12, 9--13, 13--14, 11--12, 12--10, 7--4)
ecount(g)
V(g)$color <- "lavender"
V(g)$label.color <- "darkgrey"
# Plotten
plot(g,
vertex.size = 20,
vertex.label.cex = 0.8)The degree centralty (or valence) of a vertex \(v\) is the number of edges connected to \(v\). Its thus a measure of immediate connections. Related to the concept of degree is the ego size. The \(n\)th order ego network of a given vertex \(v\) is the set including \(v\) itself and all vertices that are reachable from \(v\) by a path of length \(n\). The size of the \(n\) th order ego network is the number of vertices in it.
Out-degree: the number of edges going out from a vertex.
In-degree: the number of edges going into a vertex.
The closeness centrality of a vertec \(x\) in a connected graph is the inverse of the sum of the distances from \(x\) to all other vertices. Through inverting the distancematrix we make sure, that lower total distances will generate higher closeness centrality scores. In connected graphs is defined as:
\[ C_B(x) = \frac{1}{\sum_{y \in V} d(x,y)} \]
where \(d(x,y)\) is the distance between vertices \(x\) and \(y\).
The betweenness centrality of a vertex \(v\) is the number of shortest paths between all pairs of vertices that pass through \(v\). In connected graphs is defined as:
\[ C_B(v) = \sum_{s. t \in V \\ s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \(\sigma_{st}\) is the number of shortest paths between vertices \(s\) and \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).
The Eigenvector centrality (or prestige) is a measure of how connected a vertex is to other influential vertices in the graph. Relative scores are assigned to each vertex based on the scores of its neighbors. It can be calculated using the adjacency matrix of the graph.
For a given graph \(G\) = (V,E), let \(A = (a_{v,t})\) be the adjacency matrix, i.e. \(a_{v,t} = 1\) if vertex \(v\) is linked to vertex \(t\), and \(a_{v,t} = 0\) otherwise. The relative centrality score, \(x_v\), of vertex \(v\) can be defined as:
\[ {\displaystyle x_{v}={\frac {1}{\lambda }}\sum _{t\in M(v)}x_{t}={\frac {1}{\lambda }}\sum _{t\in V}a_{v,t}x_{t}} \]
where \(M(v)\) is the set of neighbors of vertex \(v\) and \(\lambda\) is a constant. The centrality score of a vertex is proportional to the sum of the centrality scores of its neighbors. The constant \(\lambda\) is the largest eigenvalue of the adjacency matrix.
We have not looked at the impact of edge weights on centrality in this chapter. This is because it is unusual to consider edge weights in centrality measures. However most centrality measures can be adapted to take edge weights into account. This could be an interesting lookout as a subject for your final project.
The local clustering coefficient gives an indication of the of the extend of clustering of a single vertex (How close its neighbors are to being a clique (complete graph).
Let \(G = (V, E)\) be an undirected simple graph1 with \(V\) vertices and \(E\) edges. \(n\) is the total number of vertices in the graph, \(m\) the total number of edges.
The neighborhood \(N_i\) of a vertex \(v_i\) are its immediate neighbors and defined as followed:
\[ N_i = \{v_j : e_{ij} \in E \wedge e_{ji} \in E\} \]
We define \(k_i\) as he number of vertices \(|N_i|\) in the neighborhood of \(v_i\). The local clustering coefficient of a vertex \(v_i\) is the proportion of edges between the vertices within its neighborhood divided by the number of edges that could possibly exist between them.
Thus we define the local clustering coefficient of a vertex \(v_i\) as:
\[ C_i = \frac{\{|e_{jk} : v_j, v_k \in N_i, e_{jk} \in E|\}}{k_i (k_i-1)} \]
where \(e_{ij}\) is the number of edges between the vertices in the neighborhood of \(v_i\) and \(k_i\) is the number of vertices in the neighborhood of \(v_i\).
Which node in the network g has the highest
Think about it first and then check your results using R.
Granovetter’s theory of the strength of weak ties suggests that weak ties (i.e., acquaintances) can be more valuable than strong ties (i.e., close friends) in terms of information flow and access to new opportunities. This is because weak ties often connect individuals to different social groups, providing access to diverse information and resources that may not be available within one’s immediate social circle.
# Define the edges
edges <- c(
1,2, 1,3, 2,3, # Triad 1: nodes 1,2,3 fully connected
4,5, 4,6, 5,6, # Triad 2: nodes 4,5,6 fully connected
3,4 # Weak tie between the two triads
)
# Create the graph
weak <- make_graph(edges, directed = FALSE)
# Plot the graph
plot(weak,
vertex.size=30,
vertex.color = "lavender",
edge.width=c(rep(2,6), 0.5), # Thicker edges within triads, thin edge for weak tie
edge.color=c(rep("black",6), "gray")
) # Force-directed layout for nice separation
# Add a legend
legend("bottomleft",
legend = c("Strong Tie", "Weak Tie"),
col = c("black", "gray"),
lwd = c(2, 1),
bty = "n")The connection between the two triads is a weak tie, while the connections within each triad are strong ties. The weak tie (3-4) allows for information flow between the two otherwise disconnected groups.
Remember last session:
Empirically, bridges are extremely rare in social networks.
This is why we also introduce the notion of local bridges.
Edge {A,B} is a local bridge if its endpoints \(A\) and \(B\) have no contacts in common. So if deleting the edge will increase the distance between \(A\) and \(B\) with more than 2. So it provides their endpoints with access to parts of the network that otherwise would be far away.
Following Granovetters ideas on triadic closure, why must every local bridge be a weak tie?
While most social structures tend to be characterized by dense clusters of strong connections, some individuals can act as mediators between these clusters. These individuals are often referred to as brokers and according to Burt (1995) they can benefit from their position as brokers between two or more groups. Burt calls the space between two groups a structural hole.
Burt introduces the measurement of constraint to measure the degree to which a vertex is constrained by its neighbours.
High constraint = your friends know each other a lot → your network is tight and redundant.
Low constraint = your friends are mostly independent → you can bridge different groups (you have more “social capital”).
\[ C_i = \sum_{j \in V_i \setminus \{i\}} \left( p_{ij} + \sum_{q \in V_i \setminus \{i,j\}} p_{iq} p_{qj} \right)^2 \]
A simple graph is a graph that does not contain multiple edges or loops.↩︎
Social capital
Social Capital as a concept goes back to the work of Pierre Bourdieu (1986). Its is not such a crisply defined term as it may seem. But basically, the concept boils down to the access to resources through the social network. In the most simple terms we could operationalize this as the resources possessed by the (direct) social ties of ego. We can roughly differentiate two dimensions along which major definitions of social capital can be categorized: resources vs. structure, and a focus on the individual capital vs group capital. (Lin 2001) provides a good overview of the general concept.
The structural based approach mainly dates back to (Coleman 1958) who proposed that the social capital should be operationalized as the cohesion of a network or a subgroup. He claimed that cohesion produces trust, predictability and enables cooperation. This draws on Rational Choice Theory and proposes stable, cohesive groups with visibility of each others behavior as a prevention of defection. See network cohesion measures for descriptives.
While addressing a similar problem as Coleman, Robert Putnam (Putnam 2001) claims that groups (and indirectly the individuals in them) need civic engagement. As such, this approach focuses less on the structural properties of the network and more on what people do/ bring into the network. Hence the label “resource-based approach”. A simple way to do this is to calculate average levels of, e.g., voter turnout or participation in civic organizations per group or sub-group.
The structural approach on the individual level relates mainly to the works of Mark Granovetter (1973) and Ronald Burt (1995). Granovetter’s work on the strength of weak ties and Burt’s work on structural holes are two of the most influential theories in social network analysis. Granovetter’s theory suggests that weak ties (acquaintances) can be more valuable than strong ties (close friends) in terms of information flow and access to new opportunities. Burt’s theory emphasizes the importance of being a broker or connector between different groups in a network, which can lead to greater access to resources and information.